69 research outputs found
An Automated Bayesian Framework for Integrative Gene Expression Analysis and Predictive Medicine
Motivation: This work constructs a closed-loop Bayesian network framework for predictive medicine via integrative analysis of publicly available gene expression findings pertaining to various diseases. Results: An automated pipeline was successfully constructed. Integrative models were built from gene expression data obtained from GEO experiments relating to four different diseases using Bayesian statistical methods. Many of these models demonstrated a high level of accuracy and predictive ability. The approach described in this paper can be applied to any complex disorder and can include any number and type of genome-scale studies.
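The abstract does not specify the pipeline's internals; as a rough illustration of the kind of Bayesian prediction such models perform, here is a minimal Gaussian naive Bayes sketch over toy expression values. All gene expression numbers and class labels below are invented for illustration, not taken from the study.

```python
import math

def fit_gaussian_nb(samples, labels):
    """Estimate per-class prior and per-gene mean/variance (toy Gaussian naive Bayes)."""
    params = {}
    for cls in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == cls]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        varis = [sum((x - m) ** 2 for x in col) / n + 1e-6
                 for col, m in zip(zip(*rows), means)]
        params[cls] = (math.log(n / len(samples)), means, varis)
    return params

def predict(params, x):
    """Pick the class with the highest posterior log-probability."""
    def loglik(cls):
        prior, means, varis = params[cls]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, varis))
    return max(params, key=loglik)

# Toy expression profiles (rows = samples, columns = two hypothetical genes).
expr = [[5.1, 0.2], [4.9, 0.1], [1.0, 3.3], [1.2, 3.1]]
status = ["disease", "disease", "control", "control"]
model = fit_gaussian_nb(expr, status)
print(predict(model, [5.0, 0.3]))  # a profile resembling the disease samples
```

The actual framework builds integrative Bayesian networks over multiple GEO studies; this sketch shows only the simplest Bayesian classifier of the same family.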
Automated Synthesis and Visualization of a Chemotherapy Treatment Regimen Network
Cytotoxic treatments for cancer remain highly toxic, expensive, and variably efficacious. Many chemotherapy regimens are never directly compared in randomized clinical trials (RCTs); as a result, the vast majority of guideline recommendations are ultimately derived from human expert opinion. We introduce an automated network meta-analytic approach to this clinical problem, with nodes representing regimens and edges representing direct comparisons via RCT(s). A chemotherapy regimen network is visualized for the primary treatment of chronic myelogenous leukemia (CML). Node and edge color, size, and opacity are all utilized to provide additional information about the quality and strength of the depicted evidence. Historical versions of the network are also created. With this approach, we were able to compactly compare the results of 17 CML regimens involving RCTs of 9,700 patients, representing the accumulation of 45 years of evidence. Our results closely parallel the recommendations issued by a professional guidelines organization, the National Comprehensive Cancer Network (NCCN). This approach offers a novel method for interpreting complex clinical data, with potential implications for future objective guideline development.
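The node/edge construction the abstract describes can be sketched minimally. The regimen names and patient counts below are invented purely for illustration; the aggregated edge attributes (trial count, pooled patients) correspond to what the paper encodes visually as edge size and opacity.

```python
from collections import defaultdict

# Hypothetical comparison data: (regimen_a, regimen_b, total_patients).
trials = [
    ("REG-A", "REG-B", 400),
    ("REG-A", "REG-C", 250),
    ("REG-B", "REG-A", 300),  # a second RCT comparing the same pair
]

def build_network(rcts):
    """Nodes are regimens; each edge aggregates the RCTs directly comparing a pair."""
    edges = defaultdict(lambda: {"n_trials": 0, "n_patients": 0})
    for a, b, n in rcts:
        key = tuple(sorted((a, b)))          # undirected edge
        edges[key]["n_trials"] += 1
        edges[key]["n_patients"] += n
    nodes = sorted({r for pair in edges for r in pair})
    return nodes, dict(edges)

nodes, edges = build_network(trials)
print(nodes)                      # ['REG-A', 'REG-B', 'REG-C']
print(edges[("REG-A", "REG-B")])  # two pooled trials, 700 patients
```

Pooling multiple RCTs of the same pair into one weighted edge is what lets decades of evidence collapse into a single compact graph.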
A Bayesian Translational Framework for Knowledge Propagation, Discovery, and Integration Under Specific Contexts
The immense corpus of biomedical literature existing today poses challenges in information search and integration. Many links between pieces of knowledge occur or are significant only under certain contexts, rather than across the entire corpus. This study proposes using networks of ontology concepts, linked based on their co-occurrences in annotations of abstracts of biomedical literature and descriptions of experiments, to draw conclusions based on context-specific queries and to better integrate existing knowledge. In particular, a Bayesian network framework is constructed to allow for the linking of related terms from two biomedical ontologies under the queried context concept. Edges in such a Bayesian network allow associations between biomedical concepts to be quantified and inference to be made about the existence of some concepts given prior information about others. This approach could potentially be a powerful inferential tool for context-specific queries, applicable to ontologies in other fields as well.
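The Bayesian network in the paper involves context-conditioned structure; the sketch below shows only the underlying co-occurrence statistic from which such edge weights can be estimated. The annotation sets and concept names are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotation sets: each record (e.g. an abstract) carries ontology concepts.
annotations = [
    {"inflammation", "TNF", "arthritis"},
    {"inflammation", "TNF"},
    {"TNF", "apoptosis"},
    {"inflammation", "arthritis"},
]

pair_counts = Counter()
term_counts = Counter()
for terms in annotations:
    term_counts.update(terms)
    pair_counts.update(frozenset(p) for p in combinations(sorted(terms), 2))

def p_given(a, b):
    """Estimated P(a | b): fraction of b-annotated records that also mention a."""
    return pair_counts[frozenset((a, b))] / term_counts[b]

print(p_given("inflammation", "TNF"))  # 2 of the 3 TNF records mention inflammation
```

Conditioning such estimates on a queried context concept (counting only records annotated with that context) is the step that makes the resulting associations context-specific.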
Temporal characterization of patient state with applications to prediction of tachycardia in anesthesia via induction of inhaled desflurane
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (leaves 64-65). It has always been assumed that using clinically measurable parameters is the most efficient way to characterize patient state. By adding additional sensors, monitors, and derived statistics (e.g., mean arterial blood pressure from diastolic and systolic pressures), it was hoped that more information could be garnered about patient state. This thesis challenges the assumption that providing the physician with a full set of clinically measurable parameters is the most efficient way to characterize patient state. The thesis presents a novel way to consider patient state by utilizing reduced dimensionality and by estimating noise. It then explores an application, namely prediction of tachycardia, which often occurs at the onset of induction of inhaled desflurane. One unexpected initial finding was that all 46 patients exhibited tachycardia or hypertension within the first hour of the operation. Three models for predicting tachycardia episodes are proposed, including one model based on Blind Noise Adjusted Principal Component Analysis, using Iterative Order and Noise Estimate (ION) and Principal Component Analysis (PCA). Without ION, PCA-based methods alone yielded only 2 useful degrees of freedom, with the rest being relegated to noise. The ION PCA-based method allows one to capture with 5 principal components the information contained in 31 fundamental and derived patient variables, while at the same time reducing the effects of noise. Furthermore, the five discovered significant principal components representing patient state were characterized quantitatively, and their physiologic correlates are hypothesized qualitatively.
Examination of the 31 original patient parameters in the ION PCA model that predicts tachycardia revealed their relative importance to the tachycardia problem. The receiver operating characteristic (ROC) curve for the ION PCA-based predictor suggested a 70% detection rate with 3% false alarms when predicting tachycardia two minutes and twenty seconds into the future. While the patient state characterization method was used here for tachycardia prediction, it is potentially useful in myriad medical domains involving multivariate analysis. by Gil Alterovitz. S.M.
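The thesis's ION-augmented pipeline is not reproduced here; as a minimal sketch of the PCA half of the method, the following dependency-free power iteration recovers the dominant principal component from toy "patient state" rows. All numbers are invented: two perfectly correlated vital-sign columns plus a low-variance noise channel.

```python
import math

def top_principal_component(rows, iters=200):
    """First principal component via power iteration on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, means)] for r in rows]
    # covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy rows: two correlated vitals plus a noise channel (invented numbers).
data = [[70, 120, 0.1], [75, 125, -0.2], [80, 130, 0.05],
        [85, 135, -0.1], [90, 140, 0.15]]
pc1 = top_principal_component(data)
print([round(x, 3) for x in pc1])  # weight concentrates on the two correlated vitals
```

The noise channel receives almost no weight in the leading component, which is the behavior the thesis exploits: a handful of components carry the information of dozens of raw and derived variables.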
A Bayesian framework for statistical signal processing and knowledge discovery in proteomic engineering
Thesis (Ph.D.)--Harvard-MIT Division of Health Sciences and Technology, February 2006. Includes bibliographical references (leaves 73-85). Proteomics has been revolutionized in the last couple of years through integration of new mass spectrometry technologies such as Surface-Enhanced Laser Desorption/Ionization (SELDI) mass spectrometry. As data is generated in an increasingly rapid and automated manner, novel and application-specific computational methods will be needed to deal with all of this information. This work seeks to develop a Bayesian framework in mass-based proteomics for protein identification. Using the Bayesian framework in a statistical signal processing manner, mass spectrometry data is filtered and analyzed in order to estimate protein identity. This is done by a multi-stage process which compares probabilistic networks generated from mass spectrometry-based data with a mass-based network of protein interactions. In addition, such models can provide insight on features of existing models by identifying relevant proteins. This work finds that the search space of potential proteins can be reduced such that simple antibody-based tests can be used to validate protein identity. This is done with real proteins as a proof of concept. Regarding protein interaction networks, the largest human protein interaction meta-database was created as part of this project, containing over 162,000 interactions. A further contribution is the implementation of the massome network database of mass-based interactions, which is used in the protein identification process. This network is explored in terms of potential usefulness for protein identification. The framework provides an approach to a number of core issues in proteomics. Besides providing these tools, it yields a novel way to approach statistical signal processing problems in this domain in a way that can be adapted as proteomics-based technologies mature. by Gil Alterovitz. Ph.D.
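The multi-stage network comparison in the thesis is more elaborate than can be shown here, but the core idea of shrinking the candidate-protein search space by mass can be sketched with hypothetical peptide masses (all protein names, masses, and thresholds below are invented):

```python
# Hypothetical candidate proteins and their peptide masses in Da.
candidates = {
    "PROT_A": [512.3, 1024.6, 2048.9],
    "PROT_B": [600.1, 1300.5],
    "PROT_C": [512.4, 1300.4, 1999.0],
}

def filter_candidates(observed, db, tol=0.5, min_hits=2):
    """Keep proteins whose peptide masses match at least `min_hits` observed peaks within `tol` Da."""
    survivors = []
    for name, masses in db.items():
        hits = sum(any(abs(m - peak) <= tol for peak in observed) for m in masses)
        if hits >= min_hits:
            survivors.append(name)
    return survivors

peaks = [512.5, 1300.2, 2049.0]  # observed spectrum peaks (hypothetical)
print(filter_candidates(peaks, candidates))
```

A reduced survivor list like this is what makes follow-up validation by simple antibody-based tests practical, as the abstract notes.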
Robust Prediction-Based Analysis for Genome-Wide Association and Expression Studies
Here we describe a prediction-based framework to analyze omic data and generate models for both disease diagnosis and identification of cellular pathways which are significant in complex diseases. Our framework differs from previous analyses in its use of underlying biology (cellular pathways/gene-sets) to produce predictive feature-disease models. In our study of alcoholism, lung cancer, and schizophrenia, we demonstrate the framework's ability to robustly analyze omic data of multiple types and sources, identify significant feature sets, and produce accurate predictive models.
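The use of gene-sets as features can be illustrated with a toy aggregation step: collapsing gene-level measurements into per-pathway scores that a downstream classifier would consume. Pathway and gene names below are invented, and mean aggregation is just one simple choice of summary.

```python
# Hypothetical pathway definitions and one sample's gene expression values.
pathways = {
    "PATHWAY_X": ["G1", "G2", "G3"],
    "PATHWAY_Y": ["G2", "G4"],
}
expression = {"G1": 2.0, "G2": 4.0, "G3": 6.0, "G4": 1.0}

def pathway_features(expr, gene_sets):
    """Collapse gene-level expression into one mean score per pathway."""
    return {name: sum(expr[g] for g in genes if g in expr) /
                  sum(1 for g in genes if g in expr)
            for name, genes in gene_sets.items()}

print(pathway_features(expression, pathways))  # {'PATHWAY_X': 4.0, 'PATHWAY_Y': 2.5}
```

Working in pathway space rather than raw gene space is what lets one predictive model transfer across omic data of multiple types and sources.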
Gene expression prediction using low-rank matrix completion
Background: An exponential growth of high-throughput biological information and data has occurred in the past decade, supported by technologies such as microarrays and RNA-Seq. Most data generated using such methods are used to encode large amounts of rich information, and determine diagnostic and prognostic biomarkers. Although data storage costs have decreased, the process of capturing data using the aforementioned technologies is still expensive. Moreover, the time required for the assay, from sample preparation to raw value measurement, is excessive (on the order of days). There is an opportunity to reduce both the cost and time for generating such expression datasets. Results: We propose a framework in which complete gene expression values can be reliably predicted in-silico from partial measurements. This is achieved by modelling expression data as a low-rank matrix and then applying recently discovered techniques of matrix completion by using nonlinear convex optimisation. We evaluated prediction of gene expression data based on 133 studies, sourced from a combined total of 10,921 samples. It is shown that such datasets can be constructed with a low relative error even at high missing value rates (>50 %), and that such predicted datasets can be reliably used as surrogates for further analysis. Conclusion: This method has potentially far-reaching applications including how bio-medical data is sourced and generated, and transcriptomic prediction by optimisation. We show that gene expression data can be computationally constructed, thereby potentially reducing the costs of gene expression profiling. In conclusion, this method shows great promise of opening new avenues in research on low-rank matrix completion in biological sciences. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1106-6) contains supplementary material, which is available to authorized users.
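The paper uses nuclear-norm-based convex optimisation; as a much simpler stand-in for the same low-rank principle, the sketch below completes a toy rank-1 matrix by alternating least squares, fitting factors only to the observed entries. The matrix values are invented and exactly rank-1, which is what lets this toy recover the missing entries.

```python
# Toy "expression" matrix: entry (i, j) = u[i] * v[j]; None marks unmeasured values.
M = [
    [2.0, 4.0, 6.0, None],
    [3.0, None, 9.0, 12.0],
    [None, 8.0, 12.0, 16.0],
]

def complete_rank1(M, iters=100):
    """Alternating least squares for a rank-1 factorization fitted to observed entries."""
    rows, cols = len(M), len(M[0])
    u, v = [1.0] * rows, [1.0] * cols
    for _ in range(iters):
        for i in range(rows):
            obs = [(j, M[i][j]) for j in range(cols) if M[i][j] is not None]
            u[i] = sum(v[j] * x for j, x in obs) / sum(v[j] ** 2 for j, _ in obs)
        for j in range(cols):
            obs = [(i, M[i][j]) for i in range(rows) if M[i][j] is not None]
            v[j] = sum(u[i] * x for i, x in obs) / sum(u[i] ** 2 for i, _ in obs)
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]

filled = complete_rank1(M)
print(round(filled[0][3], 3))  # predicted value for the missing (0, 3) entry
```

Real expression matrices are only approximately low-rank and need a higher rank plus regularization (or the paper's convex formulation), but the principle is the same: observed entries constrain a low-rank factorization that then fills in the unmeasured ones.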
The GA4GH Variation Representation Specification (VRS): a Computational Framework for the Precise Representation and Federated Identification of Molecular Variation
Maximizing the personal, public, research, and clinical value of genomic information will require that clinicians, researchers, and testing laboratories exchange genetic variation data reliably. Developed by a partnership among national information resource providers, public initiatives, and diagnostic testing laboratories under the auspices of the Global Alliance for Genomics and Health (GA4GH), the Variation Representation Specification (VRS, pronounced “verse”) is an extensible framework for the semantically precise and computable representation of variation that complements contemporary human-readable and flat file standards for variation representation. VRS objects are designed to be semantically precise representations of variation, and leverage this design to enable unique, federated identification of molecular variation. We describe the components of this framework, including the terminology and information model, schema, data sharing conventions, and a reference implementation, each of which is intended to be broadly useful and freely available for community use. The specification, documentation, examples, and community links are available at https://vrs.ga4gh.org/
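VRS derives its federated identifiers from digests of canonically serialized objects. The GA4GH digest algorithm, sha512t24u, is the base64url encoding of the first 24 bytes of a SHA-512 hash; the sketch below shows that digest step only, over a placeholder payload. A real VRS identifier additionally requires the specification's canonical JSON serialization rules and type-prefixed identifier syntax, which are not reproduced here.

```python
import base64
import hashlib

def sha512t24u(blob: bytes) -> str:
    """GA4GH truncated digest: base64url of the first 24 bytes of SHA-512."""
    return base64.urlsafe_b64encode(hashlib.sha512(blob).digest()[:24]).decode("ascii")

# Placeholder payload; real VRS digests are computed over the spec's canonical
# JSON serialization of the variation object, not an arbitrary byte string.
payload = b'{"end":44908822,"start":44908821,"type":"SequenceLocation"}'
digest = sha512t24u(payload)
print(digest, len(digest))  # 24 bytes -> 32 base64url characters, no padding
```

Because the digest depends only on the serialized content, independent systems compute the same identifier for the same variation without coordinating, which is what "federated identification" means here.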